Supervised Classifier

Author

  • Richard Cole
Abstract

Supervised Machine Learning is an important field with many immediate applications. As a result, there is an increasing number of public domain tools with a diversity of learning approaches. However, very little work has been done to identify which public domain machine learning tools are "best" and on what kind of data. This research is a comparative study of different supervised public domain Machine Learning algorithms. It also includes the use of data analysis to explain the performance. Specific characteristics of twelve Machine Learning classifiers are analyzed and their performance is compared on twenty-nine UCI datasets. Data analysis using visualization and feature selection is also performed to confirm hypotheses about algorithm performance and to explore interesting structure in the data. The experimental algorithms are categorized based on classification performance, and reasons for differences in classifier performance are discussed. It is shown that the variance of algorithms with respect to classification accuracy is due to the nature of the experimental datasets and the way various algorithms react to the properties/characteristics of the data. The differences in classification performance are also due to several other reasons: overfitting, pruning techniques, and difficulty in specifying optimal learning parameters.

Acknowledgments

There are a large number of people who have made valuable contributions to this project. First and foremost I would like to thank my supervisor Dr. Peter Eklund for his insightful guidance and strong leadership, and also for encouraging and helping me to overcome all difficulties in task fulfillment as well as language barriers in writing this thesis. Thanks also to Richard Cole for his friendship and for helping me in many ways. I wish to thank Peter Deer for his interest in the project and for his detailed and valuable comments about the discussion and writing style of the first drafts of this thesis. I am grateful to G. Klanniscek, M. Soklic and M. Zwitter of the University Medical Center, Ljubljana for the use of the medical data and I. Kononenko for its conversion to a form suitable for the induction algorithms. I am also grateful to David Aha (UCI) for the compilation and use of the UCI Repository of Machine Learning Databases. I am also grateful to P. Clark and T. Niblett, P. E. Utgoff and C. E. Brodley, S. K. Murthy, T. Kohonel, T. W. Rauber, the programming team at the Institute for Parallel and Distributed High Performance System (the University of Stuttgart), and the programming team at the Department of Internal Medicine, Electrical Engineering, and Computer Science (the University of Nevada) for creating wonderful software packages and granting free access to Internet users for academic use. As an important part of this acknowledgment, I would like to thank the Australian Government for granting me the postgraduate scholarship through the Australian Agency for International Development (AusAID), which made this study possible. I would like to thank all staff members of the Department of Computer Science, The University of Adelaide for providing me with a convenient working platform and for all the comprehensive knowledge about Computer Science I obtained. Thanks also to all Honors/Masters students, and to Andrew Burrow and Simon Pollitt, who offered me friendship and helped me to overcome difficulties in study and real life.

Similar resources

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach using random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...

Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection

Automatic processing of seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is seismic object detection using different supervised classification methods, whose final output is a probability cube. The object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered the Plutchik emotion model as the supervised learning criteria and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

A Semi-Supervised Framework Based on a Self-Constructed Adaptive Lexicon for Persian Sentiment Analysis

With the appearance of Web 2.0 and 3.0, users' contributions to the WWW have created a huge amount of valuable expressed opinions. Considering the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has received considerable attention. Unlike other (popular) languages, a limited number of research studies have been conducted in ...

Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran

Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...

Semi-supervised Multiple Classifier Systems: Background and Research Directions

Multiple classifier systems were originally proposed for supervised classification tasks. In the five editions of the MCS workshop, most of the papers have dealt with design methods and applications of supervised multiple classifier systems. Recently, the use of multiple classifier systems has been extended to unsupervised classification tasks. Despite its practical relevance, semi-supervised ...

Publication date: 2007